Abstract


Deep neural networks (DNNs) have advanced performance on a wide range of complex tasks, rapidly outpacing our understanding of the nature of their solutions. While past work sought to advance our understanding of these models, none has made use of the rich history of problem descriptions, theories, and experimental methods developed by cognitive psychologists to study the human mind. To explore the potential value of these tools, we chose a well-established analysis from developmental psychology that explains how children learn word labels for objects, and applied that analysis to DNNs. Using datasets of stimuli inspired by the original cognitive psychology experiments, we find that state-of-the-art one-shot learning models trained on ImageNet exhibit a similar bias to that observed in humans: they prefer to categorize objects according to shape rather than color. The magnitude of this shape bias varies greatly among architecturally identical, but differently seeded models, and even fluctuates within seeds throughout training, despite nearly equivalent classification performance. These results demonstrate the capability of tools from cognitive psychology for exposing hidden computational properties of DNNs, while concurrently providing us with a computational model for human word learning.


1. Introduction 


During the last half-decade deep learning has significantly improved performance on a variety of tasks (for a review, see LeCun et al. (2015)). However, deep neural network (DNN) solutions remain poorly understood, leaving many to think of these models as black boxes, and to question whether they can be understood at all (Bornstein, 2016; Lipton, 2016). This opacity obstructs both basic research seeking to improve these models, and applications of these models to real world problems (Caruana et al., 2015).


Recent efforts have aimed to better understand DNNs: tailor-made loss functions and architectures produce more interpretable features (Higgins et al., 2016; Raposo et al., 2017), while output-behavior analyses unveil previously opaque operations of these networks (Karpathy et al., 2015). Parallel to this work, neuroscience-inspired methods such as activation visualization (Li et al., 2015), ablation analysis (Zeiler & Fergus, 2014) and activation maximization (Yosinski et al., 2015) have also been applied.


Altogether, this line of research has developed a set of promising tools for understanding DNNs, each paper producing a glimmer of insight. Here, we propose another tool for the kit, leveraging methods inspired not by neuroscience, but instead by psychology. Cognitive psychologists have long wrestled with the problem of understanding another opaque intelligent system: the human mind. We contend that the search for a better understanding of DNNs may profit from the rich heritage of problem descriptions, theories, and experimental tools developed in cognitive psychology. To test this belief, we performed a proof-of-concept study on state-of-the-art DNNs that solve a particularly challenging task: one-shot word learning. Specifically, we investigate Matching Networks (MNs) (Vinyals et al., 2016), which have state-of-the-art one-shot learning performance on ImageNet, as well as an Inception Baseline model (Szegedy et al., 2015a).


Following the approach used in cognitive psychology, we began by hypothesizing an inductive bias our models may use to solve a word learning task. Research in developmental psychology shows that when learning new words, humans tend to assign the same name to similarly shaped items rather than to items with similar color, texture, or size. To test the hypothesis that our DNNs discover this same “shape bias”, we probed our models using datasets and an experimental setup based on the original shape bias studies (Landau et al., 1988).
Our results are as follows: 1) Inception networks trained on ImageNet do indeed display a strong shape bias. 2) There is high variance in the bias between Inception networks initialized with different random seeds, demonstrating that otherwise identical networks converge to qualitatively different solutions. 3) MNs also have a strong shape bias, and this bias closely mimics the bias of the Inception model that provides input to the MN. 4) By emulating the shape bias observed in children, these models provide a candidate computational account for human one-shot word learning. Altogether, these results show that the technique of testing hypothesized biases using probe datasets can yield both expected and surprising insights about solutions discovered by trained DNNs¹.


2. Inductive Biases, Statistical Learners and Probe Datasets


Before we delve into the specifics of the shape bias and one-shot word learning, we will describe our approach in the general context of inductive biases, probe datasets, and statistical learning. Suppose we have some data $\{y_i, x_i\}_{i=1}^{N}$, where $y_i = f(x_i)$. Our goal is to build a model $g(\cdot)$ of the data that minimizes some loss function $L$ measuring the disparity between $y$ and $g(x)$, e.g., $L = \sum_i \|y_i - g(x_i)\|^2$. For instance, $x$ might be images of ImageNet objects to be classified, images and histology of tumors to be classified as benign or malignant (Kourou et al., 2015), or medical history and vital measurements to be classified according to likely pneumonia outcomes (Caruana et al., 2015).


A statistical learner such as a DNN will minimize $L$ by discovering properties of the input $x$ that are predictive of the labels $y$. These discovered predictive properties are, in effect, the properties of $x$ for which the trained model has an inductive bias. Examples of such properties include the shape of ImageNet objects, the number of nodes of a tumor, or a particular constellation of blood test values that often precedes an exacerbation of pneumonia symptoms.


Critically, in real-world datasets such as these, the discovered properties are unlikely to correspond to a single feature of the input $x$; instead they correspond to complex conjunctions of those features. We could describe one of these properties using a function $h(x)$, which, for example, returns the shape of the focal object given an ImageNet image, or the number of nodes given a scan of a tumor. Indeed, one way to articulate the difficulty in understanding DNNs is to say that we often can’t intuitively describe these conjunctions of features $h(x)$; although we often have numerical representations of them in intermediate DNN layers, they’re often too arcane for us to interpret.


We advocate for addressing this problem using the following hypothesis-driven approach. First, propose a property $h_p(x)$ that the model may be using. Critically, it’s not necessary that $h_p(x)$ be a function that can be evaluated using an automated method. Instead, the intention is that $h_p(x)$ is a function that humans (e.g. ML researchers and practitioners) can intuitively evaluate. $h_p(x)$ should be a property that is believed to be relevant to the problem, such as object shape or number of tumor nodes.


After proposing a property, the next step is to generate predictions about how the model should behave when given various inputs, if in fact it uses a bias with respect to the property $h_p(x)$. Then, construct and carry out an experiment wherein those predictions are tested. In order to execute such an experiment, it typically will be necessary to craft a set of probe examples $x$ that cover a relevant portion of the range of $h_p(x)$, for example a variety of object shapes. The results of this experiment will either support or fail to support the hypothesis that the model uses $h_p(x)$ to solve the task. This process can be especially valuable in situations where there is little or no training data available in important regions of the input space, and a practitioner needs to know how the trained model will behave in that region.
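As a minimal sketch of the final step, assume each probe example poses a binary forced choice and that a model with no bias toward $h_p(x)$ would agree with the prediction at the chance rate of 0.5. The helper names below are hypothetical, and the exact one-sided binomial test is our illustrative choice rather than a procedure prescribed here; it simply asks whether the observed agreement rate is larger than chance would explain.

```python
import math
from typing import Callable, Sequence, Tuple


def one_sided_binomial_p(successes: int, trials: int, chance: float = 0.5) -> float:
    """Exact P(X >= successes) for X ~ Binomial(trials, chance)."""
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need trials > 0 and 0 <= successes <= trials")
    return sum(
        math.comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
        for k in range(successes, trials + 1)
    )


def probe_agreement(
    model_choice: Callable[[object], int],
    predicted_choice: Callable[[object], int],
    probes: Sequence[object],
) -> Tuple[float, float]:
    """Agreement rate between the model and the h_p prediction, plus a p-value vs. chance."""
    if not probes:
        raise ValueError("need at least one probe example")
    matches = sum(model_choice(x) == predicted_choice(x) for x in probes)
    return matches / len(probes), one_sided_binomial_p(matches, len(probes))


# A model that agrees with the h_p prediction on 9 of 10 probes:
print(one_sided_binomial_p(9, 10))  # 0.0107421875, i.e. 11/1024
```

A small p-value supports the hypothesis that the model uses $h_p(x)$; a large one fails to support it, which is not the same as showing the property is unused.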


Psychologists have developed a repertoire of such hypotheses and experiments in their effort to understand the human mind. Here we explore the application of one of these theory-experiment pairs to state-of-the-art one-shot learning models. We will begin by describing the historical backdrop for the human one-shot word learning experiments that we will then apply to our DNNs.


3. The problem of word learning; the solution of inductive biases


Discussions of one-shot word learning in the psychological literature inevitably begin with the philosopher W.V.O. Quine, who broke this problem down and described one of its most computationally challenging components: there is an enormous number of tenable hypotheses that a learner can use to explain a single observed example. To make this point, Quine penned his now-famous parable of the field linguist who has gone to visit a culture whose language is entirely different from our own (Quine, 1960). The linguist is trying to learn some words from a helpful native, when a rabbit runs past. The native declares “gavagai”, and the linguist is left to infer the meaning of this new word. Quine points out that the linguist is faced with an abundance of possible inferences, including that “gavagai” refers to rabbits, animals, white things, that specific rabbit, or “undetached parts of rabbits”. Quine argues that indeed there is an infinity of possible inferences to be made, and uses this conclusion to bolster the assertion that meaning itself cannot be defined in terms of internal mental events².


Contrary to Quine’s intentions, when this example was introduced to the developmental psychology community by Macnamara (1972), it spurred them not to give up on the idea of internal meaning, but instead to posit and test for cognitive biases that enable children to eliminate broad swaths of the hypothesis space (Bloom, 2000). A variety of hypothesis-eliminating biases were then proposed, including the whole object bias, by which children assume that a word refers to an entire object and not its components (Markman, 1990); the taxonomic bias, by which children assume a word refers to the basic level category an object belongs to (Markman & Hutchinson, 1984); the mutual exclusivity bias, by which children assume that a word only refers to one object category (Markman & Wachtel, 1988); the shape bias, with which we are concerned here (Landau et al., 1988); and a variety of others (Bloom, 2000). These biases were tested empirically in experiments wherein children or adults were given an object (or picture of an object) along with a novel name, then were asked whether the name should apply to various other objects.


Taken as a whole, this work yielded a computational-level (Marr, 1982) account of word learning whereby people make use of biases to eliminate unlikely hypotheses when inferring the meaning of new words. Other contrasting and complementary approaches to explaining word learning exist in the psychological literature, including association learning (Regier, 1996; Colunga & Smith, 2005) and Bayesian inference (Xu & Tenenbaum, 2007). We leave the application of these theories to deep learning models to future work, and focus on determining what insight can be gained by applying a hypothesis elimination theory and methodology.


We begin the present work with the knowledge that part of the hypothesis elimination theory is correct: the models surely use some kind of inductive biases since they are statistical learning machines that successfully model the mapping between images and object labels. However, several questions remain open. What predictive properties did our DNNs find? Do all of them find the same properties? Are any of those properties interpretable to humans? Are they the same properties that children use? How do these biases change over the course of training?


To address these questions, we carry out experiments analogous to those of Landau et al. (1988). This enables us to test whether the shape bias, a human-interpretable feature used by children when learning language, is visible in the behavior of MNs and Inception networks. Furthermore, we are able to test whether these two models, as well as different instances of each of them, display the same bias. In the next section we will describe in detail the one-shot word learning problem, and the MNs and Inception networks we use to solve it.


4. One-shot word learning models and training
4.1. One-shot word learning task 


The one-shot word learning task is to label a novel data example $\hat{x}$ (e.g. a novel probe image) with a novel class label $\hat{y}$ (e.g. a new word) after seeing only a single example of that class. More specifically, given a support set $S = \{(x_i, y_i) : i \in [1, k]\}$ of images $x_i$ and their associated labels $y_i$, and an unlabelled probe image $\hat{x}$, the one-shot learning task is to identify the true label $\hat{y}$ of the probe image from the support set labels $\{y_i : i \in [1, k]\}$:

$$\hat{y} = \arg\max_{y} P(y \mid \hat{x}, S). \qquad (1)$$


We assume that the image labels $y_i$ are represented using a one-hot encoding and that $P(y \mid \hat{x}, S)$ is parameterised by a DNN, allowing us to leverage the ability of deep networks to learn powerful representations.


4.2. Inception: baseline one-shot learning model 


In our simplest baseline one-shot architecture, a probe image $\hat{x}$ is given the label of the nearest neighbour from the support set:

$$\hat{y} = y_{i^*}, \qquad i^* = \arg\min_{i:\,(x_i, y_i) \in S} d\big(h(x_i), h(\hat{x})\big) \qquad (2)$$

where $d$ is a distance function. The function $h$ is parameterised by Inception, one of the best-performing ImageNet classification models (Szegedy et al., 2015a). Specifically, $h$ returns features from the last layer (the softmax input) of a pre-trained Inception classifier, where the Inception classifier is trained using RMSProp, as described in Szegedy et al. (2015b), section 8. With these features as input and cosine distance as the distance function, the classifier in equation 2 achieves 87.6% accuracy on one-shot classification on the ImageNet dataset (Vinyals et al., 2016). Henceforth, we call the Inception classifier together with the nearest-neighbour component the Inception Baseline (IB) model.
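Equation 2 can be sketched as follows, assuming the features $h(x)$ have already been extracted (the Inception feature extractor itself is not reproduced here) and that the support set is a list of (features, label) pairs. The function names are illustrative; both cosine and Euclidean distance are provided because the experiments in Section 6 use both.

```python
import math
from typing import Callable, Hashable, List, Sequence, Tuple

Vector = Sequence[float]


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cos(a, b); undefined for zero vectors."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - sum(x * y for x, y in zip(a, b)) / norm


def euclidean_distance(a: Vector, b: Vector) -> float:
    return math.dist(a, b)


def nearest_neighbour_label(
    probe_features: Vector,
    support: List[Tuple[Vector, Hashable]],
    distance: Callable[[Vector, Vector], float] = cosine_distance,
) -> Hashable:
    """Return the label y_i of the support item whose h(x_i) is closest to h(x_hat)."""
    _, label = min(support, key=lambda item: distance(item[0], probe_features))
    return label


support = [([1.0, 0.0], "dax"), ([0.0, 1.0], "wug")]
print(nearest_neighbour_label([0.9, 0.2], support))  # dax
```

In this sketch ties are broken by support-set order, since `min` returns the first minimal item.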


² Unlike Quine, we use a pragmatic definition of meaning: a human or model understands the meaning of a word if they assign that word to new instances of objects in the correct category.
4.3. Matching Nets model architecture and training


We also investigate a state-of-the-art one-shot learning architecture called Matching Nets (MN) (Vinyals et al., 2016). MNs are a fully differentiable neural network architecture with state-of-the-art one-shot learning performance on ImageNet (93.2% one-shot labelling accuracy).


MNs are trained to assign label $\hat{y}$ to probe image $\hat{x}$ according to equation 1 using an attention mechanism $a$ acting on image embeddings stored in the support set $S$:

$$a(\hat{x}, x_i) = \frac{e^{d(f(\hat{x}, S),\, g(x_i, S))}}{\sum_{j=1}^{k} e^{d(f(\hat{x}, S),\, g(x_j, S))}} \qquad (3)$$

where $d$ is the cosine similarity and where $f$ and $g$ provide context-dependent embeddings of $\hat{x}$ and $x_i$ (with context $S$). The embedding $g(x_i, S)$ is a bi-directional LSTM (Hochreiter & Schmidhuber, 1997) with the support set $S$ provided as an input sequence. The embedding $f(\hat{x}, S)$ is an LSTM with a read-attention mechanism operating over the entire embedded support set. The input to the LSTM is given by the penultimate layer features of a pre-trained deep convolutional network, specifically Inception, as in our baseline IB model described above (Szegedy et al., 2015a).
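The attention step can be sketched in a few lines. This sketch assumes the context embeddings $f(\hat{x}, S)$ and $g(x_i, S)$ have already been computed (the LSTMs are omitted), uses cosine similarity for $d$, and forms the label distribution as the attention-weighted sum of one-hot support labels, $P(y \mid \hat{x}, S) = \sum_i a(\hat{x}, x_i)\, y_i$, as in Vinyals et al. (2016). The function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return sum(x * y for x, y in zip(a, b)) / norm


def attention_weights(probe_emb: Vector, support_embs: List[Vector]) -> List[float]:
    """Equation 3: softmax over cosine similarities between f(x_hat, S) and each g(x_i, S)."""
    sims = [cosine_similarity(probe_emb, e) for e in support_embs]
    top = max(sims)  # subtract the max before exponentiating, for numerical stability
    exps = [math.exp(s - top) for s in sims]
    total = sum(exps)
    return [e / total for e in exps]


def label_distribution(
    probe_emb: Vector, support_embs: List[Vector], one_hot_labels: List[List[float]]
) -> List[float]:
    """P(y | x_hat, S) as the attention-weighted sum of the one-hot support labels."""
    weights = attention_weights(probe_emb, support_embs)
    num_classes = len(one_hot_labels[0])
    return [
        sum(w * label[c] for w, label in zip(weights, one_hot_labels))
        for c in range(num_classes)
    ]


probs = label_distribution([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
print([round(p, 4) for p in probs])  # [0.7311, 0.2689]
```

Because cosine similarity is bounded in $[-1, 1]$, the weights in this unscaled form stay fairly soft; taking the argmax of `probs` still yields the prediction of equation 1.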


The training procedure for the one-shot learning task is critical if we want MNs to classify a probe image $\hat{x}$ after viewing only a single example of this new image class in its support set (Hochreiter et al., 2001; Santoro et al., 2016).


To train MNs we proceed as follows: (1) At each step of training, the model is given a small support set of images and associated labels. In addition to the support set, the model is fed an unlabelled probe image $\hat{x}$; (2) The model parameters are then updated to improve classification accuracy of the probe image $\hat{x}$ given the support set. Parameters are updated using stochastic gradient descent with a learning rate of 0.1; (3) After each update, the labels $\{y_i : i \in [1, k]\}$ in the training set are randomly re-assigned to new image classes (the label indices are randomly permuted, but the image labels are not changed). This is a critical step. It prevents MNs from learning a consistent mapping between a category and a label. Usually, in classification, this is what we want, but in one-shot learning we want to train our model for classification after viewing a single in-class example from the support set. Formally, our objective function, which is maximized, is:

$$L = \mathbb{E}_{C \sim T}\Big[\mathbb{E}_{S \sim C,\, B \sim C}\Big[\sum_{(x, y) \in B} \log P(y \mid x, S)\Big]\Big] \qquad (4)$$

where $T$ is the set of all possible labellings of our classes, $S$ is a support set sampled with a class labelling $C \sim T$, and $B$ is a batch of probe images and labels, also with the same randomly chosen class labelling as the support set.
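The episode construction in steps (1) to (3) can be sketched as below, assuming a mapping from class names to image identifiers. This is a hypothetical helper, not the training code used for the experiments: it draws a fresh random labelling $C \sim T$ for each episode, builds a one-shot support set $S$ with one image per class, and draws the probe batch $B$ from the remaining images under the same labelling. The model update that would consume each episode is omitted.

```python
import random
from typing import Dict, List, Tuple

Episode = Tuple[List[Tuple[str, int]], List[Tuple[str, int]]]


def sample_episode(
    images_by_class: Dict[str, List[str]], k: int, batch_size: int, rng: random.Random
) -> Episode:
    """Sample a k-way one-shot support set S and a probe batch B sharing one random labelling."""
    classes = rng.sample(sorted(images_by_class), k)
    label_ids = list(range(k))
    rng.shuffle(label_ids)  # random permutation of label indices: C ~ T
    labelling = dict(zip(classes, label_ids))

    support, candidates = [], []
    for cls in classes:
        first, *rest = rng.sample(images_by_class[cls], len(images_by_class[cls]))
        support.append((first, labelling[cls]))  # exactly one example per class
        candidates.extend((img, labelling[cls]) for img in rest)  # never reused from S
    batch = rng.sample(candidates, min(batch_size, len(candidates)))
    return support, batch


rng = random.Random(0)
data = {"cat": ["c1", "c2", "c3"], "dog": ["d1", "d2"], "cup": ["u1", "u2"]}
support, batch = sample_episode(data, k=2, batch_size=3, rng=rng)
```

Because the class-to-index mapping changes from episode to episode, index 0 means a different class each time, so a model can only score well by reading the mapping off the support set, which is the purpose of step (3).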


Next we will describe the probe datasets we used to test for 
the shape bias in the IB and MNs after ImageNet training. 


5. Data for bias discovery 
5.1. Cognitive Psychology Probe Data 


The Cognitive Psychology Probe Data (CogPsyc data) that we use consists of 150 images of objects (Figure 1). The images are arranged in triples consisting of a probe image, a shape-match image (that matches the probe in shape but not colour), and a colour-match image (that matches the probe in colour but not shape). In the dataset there are 10 triples, each shown on 5 different backgrounds, giving a total of 50 triples.³


The images were generously provided by cognitive psychologist Linda Smith. The images are photographs of stimuli used previously in shape bias experiments conducted in the Cognitive Development Lab at Indiana University. The potentially confounding variables of background content and object size are controlled in this dataset.


5.2. Probe Data from the wild 


We have also assembled a real-world dataset consisting of 90 images of objects (30 triples) collected using Google Image Search. Again, the images are arranged in triples consisting of a probe, a shape-match and a colour-match. For the probe image, we chose images of real objects that are unlikely to appear in standard image datasets such as ImageNet. In this way, our data contains the irregularity of the real world while also probing our models’ properties outside of the image space covered in our training data. For the shape-match image, we chose an object with a similar shape (but with a very different colour), and for the colour-match image, we chose an object with a similar colour (but with a very different shape). For example, one triple consists of a silver tuning fork as the probe, a silver guitar capo as the colour match, and a black tuning fork as the shape match. Each photo in the dataset contains a single object on a white background.


We collected this data to strengthen our confidence in the results obtained for the CogPsyc dataset and to demonstrate the ease with which such probe datasets can be constructed. One of the authors crafted this dataset solely using Google Image Search in the span of roughly two days’ work. Our results with this dataset, especially the fact that the bias pattern over time matches the results from the well-established CogPsyc dataset, support the contention that DNN practitioners can collect effective probe datasets with minimal time expenditure using readily available tools.


³ The CogPsyc dataset is available at http://www.indiana.edu/~cogdev/SB_testsets.html
Figure 1. Example images from the Cognitive Psychology Dataset (see section 5). The data consists of image triples (rows), each containing a colour match image (left column), a shape match image (middle column) and a probe image (right column). We use these triples to calculate the shape bias by reporting the proportion of times that a model assigns the shape match image class to the probe image. This dataset was supplied by cognitive psychologist Linda Smith, and was designed to control for object size and background.


6. Results 
6.1. Shape bias in the Inception Baseline Model 


First, we measured the shape bias in IB: we used a pre-trained Inception classifier (with 94% top-5 accuracy) to provide features for our nearest-neighbour one-shot classifier, and probed the model using the CogPsyc dataset. Specifically, for a given probe image $\hat{x}$, we loaded the shape-match image $x_s$ and corresponding label $y_s$, along with the colour-match image $x_c$ and corresponding label $y_c$, into memory, as the support set $S = \{(x_s, y_s), (x_c, y_c)\}$. We then calculated $\hat{y}$ using Equation 2. Our model assigned either $y_c$ or $y_s$ to the probe image. To estimate the shape bias $B_s$, we calculated the proportion of shape labels assigned to the probe:

$$B_s = \mathbb{E}\big[\delta(\hat{y} - y_s)\big], \qquad (5)$$

where $\mathbb{E}$ is an expectation across probe images and $\delta$ is the Kronecker delta, equal to 1 when $\hat{y} = y_s$ and 0 otherwise.
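Equation 5 reduces to counting. The sketch below assumes each CogPsyc-style triple has already been mapped to feature vectors $h(\cdot)$ for (probe, shape match, colour match), applies the nearest-neighbour rule of Equation 2 with Euclidean distance (the distance reported below), and returns the fraction of triples labelled with the shape match. Resolving ties in favour of the shape match is an arbitrary choice of this sketch, and the toy features are made up.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Triple = Tuple[Vector, Vector, Vector]  # features of (probe, shape match, colour match)


def shape_bias(triples: List[Triple]) -> float:
    """B_s: fraction of triples whose probe receives the shape match's label (Equation 5)."""
    if not triples:
        raise ValueError("need at least one triple")
    shape_choices = 0
    for probe, shape_match, colour_match in triples:
        # Two-item support set {(x_s, y_s), (x_c, y_c)}; nearest neighbour wins (Equation 2).
        if math.dist(probe, shape_match) <= math.dist(probe, colour_match):
            shape_choices += 1  # ties count as shape choices in this sketch
    return shape_choices / len(triples)


toy_triples = [
    ([0.0, 0.0], [0.0, 1.0], [3.0, 0.0]),  # nearer the shape match
    ([1.0, 1.0], [5.0, 5.0], [1.0, 2.0]),  # nearer the colour match
    ([2.0, 0.0], [2.0, 0.5], [0.0, 0.0]),  # nearer the shape match
]
print(shape_bias(toy_triples))  # 2 of the 3 toy triples choose the shape match
```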


We ran all IB experiments using both Euclidean and cosine 
distance as the distance function. We found that the results 
for the two distance functions were qualitatively similar, so 
we only report results for Euclidean distance. 


We found the shape bias of IB to be $B_s = 0.68$. On our real-world dataset, the shape bias of IB was $B_s = 0.97$. Together, these results strongly suggest that IB trained on ImageNet has a stronger bias towards shape than colour.


Note that, as expected, the shape bias of this model is qualitatively similar across datasets while being quantitatively different, largely because the datasets themselves are quite different. Indeed, the datasets were chosen to be quite different so that we could explore a broad space of possibilities. In particular, our CogPsyc dataset backgrounds have much larger variability than our real-world dataset backgrounds, and our real-world dataset objects have much greater variability than the CogPsyc dataset objects.


6.2. Shape bias in the Matching Nets Model 


Next, we probed the MNs using a similar procedure. We used the IB trained in the previous section to provide the input features for the MN as described in section 4.3. Then, following the training procedure outlined in section 4.3, we trained MNs for one-shot word learning on ImageNet, achieving state-of-the-art performance, as reported by Vinyals et al. (2016). Repeating the analysis above, we found that MNs have a shape bias of $B_s = 0.7$ using our CogPsyc dataset and a bias of $B_s = 1$ using the real-world dataset. It is interesting to note that these bias values are very similar to the IB bias values.


6.3. Shape bias statistics: within models and across models


The observation of a shape bias immediately raises some important questions. In particular: (1) Does this bias depend on the initial values of the parameters in our model? (2) Does the size of the shape bias depend on model performance? (3) When does the shape bias emerge during training: before or after model convergence? (4) How does the shape bias compare between models, and within models?


To answer these questions, we extended the shape bias 
analysis described above to calculate the shape bias in a 
population of IB models and in a population of MN models 
with different random initialization (Figs. 2 and 5). 


(1) We first calculated the dependence of shape bias on the initialization of IB (Fig. 2). Surprisingly, we observed strong variability depending on the initialization. For the CogPsyc dataset, the average shape bias was $B_s = 0.628$ with standard deviation $\sigma_{B_s} = 0.049$ at the end of training, and for the real-world dataset the average shape bias was $B_s = 0.958$ with $\sigma_{B_s} = 0.037$.

Figure 2. Shape bias across models with different initialization seeds, and within models during training, calculated using the CogPsyc dataset. (a) The shape bias $B_s$ of 15 Inception models is calculated throughout training (yellow lines). A strong shape bias emerges across all models. A bias value $B_s > 0.5$ indicates a shape bias and $B_s < 0.5$ indicates a colour bias. Two examples are highlighted here (blue and red lines) for clarity. (b) The shape bias fluctuates strongly within models during training, by up to three standard deviations. (c) The distribution of bias values, calculated at the start (blue), middle (red) and end (yellow) of training. Bias variability is high at the start and end of training. Here, these distributions are calculated using kernel density estimates from all shape bias measurements from all models within the indicated window.

Figure 3. Classification accuracy of all 15 Inception models evaluated on a test set during training on ImageNet (same models as in Figure 2). All 15 Inception network seeds achieve near-identical test accuracy (overlapping yellow lines).

(2) Next, we calculated the dependence of shape bias on model performance. For the CogPsyc dataset, the correlation between bias and classification accuracy was $\rho = 0.15$, with $t_{n=15} = 0.55$, $p_{\text{one-tail}} = 0.29$, and for the real-world dataset, the correlation was $\rho = -0.06$, with $t_{n=15} = -0.22$, $p_{\text{one-tail}} = 0.42$. Therefore, we found no evidence that fluctuations in the bias are explained by fluctuations in classification accuracy. This is not surprising, because the classification accuracy of all models was similar at the end of training, while the shape bias was variable. This demonstrates that models can behave differently along important dimensions (e.g., bias) while having the same performance as measured along another (e.g., accuracy).
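The reported $t$ values are consistent with the standard test of a Pearson correlation against zero, $t = \rho\sqrt{n-2}/\sqrt{1-\rho^2}$, with $n - 2 = 13$ degrees of freedom; we assume, though the text does not state it, that this is the test used. A minimal check with hypothetical helper names (converting $t$ to a p-value additionally needs a t-distribution CDF, for example SciPy's `scipy.stats.t.sf`):

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def correlation_t(r: float, n: int) -> float:
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    if n < 3 or not -1 < r < 1:
        raise ValueError("need n >= 3 and |r| < 1")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


# Reproduce the reported statistics from rho and n = 15 models:
print(round(correlation_t(0.15, 15), 2), round(correlation_t(-0.06, 15), 2))  # 0.55 -0.22
```

With per-seed bias and accuracy values, `correlation_t(pearson_r(biases, accuracies), len(biases))` would give the same statistic directly.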


(3) Next we explored the emergence of the shape bias during training (Fig. 2a,c; Fig. 5a,c). At the start of training, the average shape bias of these models was $B_s = 0.448$ with standard deviation $\sigma_{B_s} = 0.0835$ on the CogPsyc dataset and $B_s = 0.593$ with $\sigma_{B_s} = 0.073$ on the real-world dataset. We observe that a shape bias began to emerge very early during training, long before convergence.


Figure 4. Scatter plot showing Matching Network (MN) bias as a function of Inception bias. Each MN receives input through an Inception model. Each point in this scatter plot is the bias of an MN and the bias of the Inception model providing input to that particular MN. In total, the bias values of 45 MN models are plotted (some dots are overlapping).

(4) Finally, we compare shape bias within models during training, and between models at the end of training. During training, the shape bias within IB fluctuates substantially (Fig. 2b; Fig. 5b). In contrast, the shape bias does not fluctuate during training of the MN. Instead, the MN model inherits its shape bias characteristics at the start of training from the IB that provides it with input embeddings (Fig. 4), and this shape bias remains constant throughout training. Moreover, there is no evidence that the MN and corresponding IB bias values are different from each other (paired t-test, p = 0.167). Note that we do not fine-tune the Inception model providing input while training the MN. We do this so that we can observe the shape-bias properties of the MN independently of the IB model properties.
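The paired t-test above compares each MN's bias with the bias of the Inception model feeding it. The statistic itself is short to compute; this sketch uses only the standard library and returns $t$ and its degrees of freedom, leaving the p-value to a t-distribution CDF (SciPy's `scipy.stats.ttest_rel` performs the whole test). The helper name and the bias values in the example are made up for illustration.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def paired_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Paired t statistic for H0: mean(a - b) = 0; returns (t, degrees of freedom)."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    sd = stdev(diffs)  # sample standard deviation of the paired differences
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    return mean(diffs) / (sd / math.sqrt(len(diffs))), len(diffs) - 1


# Hypothetical bias values: each MN paired with the Inception model that feeds it.
mn_bias = [0.62, 0.70, 0.58, 0.66]
ib_bias = [0.60, 0.71, 0.57, 0.64]
t, df = paired_t(mn_bias, ib_bias)
```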
Figure 5. Shape bias across models with different initialization seeds, and within models during training, calculated using the real-world dataset. (a) The shape bias $B_s$ of 15 Inception models is calculated throughout training (yellow lines). A strong shape bias emerges across all models. Two examples are highlighted here (blue and red lines) for clarity. (b) The shape bias fluctuates strongly within models during training. (c) The distribution of bias values, calculated at the start (blue), middle (red) and end (yellow) of training. Bias variability is high at the start and end of training.


7. Discussion 
7.1. A shape bias case study 


Our psychology-inspired approach to understanding DNNs produced a number of insights. Firstly, we found that both IB and MNs trained on ImageNet display a strong shape bias. This is an important result for practitioners who routinely use these models, especially for applications where it is known a priori that colour is more important than shape. As an illustrative example, if a practitioner planned to build a one-shot fruit classification system, they should proceed with caution if they plan to use pre-trained ImageNet models like Inception and MNs, because fruit are often defined according to colour features rather than shape. In applications where a shape bias is desirable (as is more often the case), this result provides reassurance that the models are behaving sensibly in the presence of ambiguity.


The second surprising finding was the large variability in shape bias, both within models during training and across models, depending on the randomly chosen initialisation of our model. This variability can arise because our models are not being explicitly optimised for shape-biased categorisation. This is an important result because it shows that not all models are created equal: some models will have a stronger preference for shape than others, even though they are architecturally identical and have almost identical classification accuracy.


Our third finding, that MNs retain the shape bias statistics of the upstream Inception network, demonstrates the possibility for biases to propagate across model components. In this case, the shape bias propagates from the Inception model through to the MN memory modules. This result is yet another cautionary observation; when combining multiple modules together, we must be aware of contamination by unknown properties across modules. Indeed, a bias that is benign in one module might only become detrimental when that module is later combined with other modules.


A natural question immediately arises from these results: how can we remove an unwanted bias or induce a desirable bias? The biases under consideration are properties of an architecture and dataset synthesized together by an optimization procedure. As such, the observation of a shape bias is partly a result of the statistics of natural image labellings as captured in the ImageNet dataset, and partly a result of the architecture attempting to extract these statistics. Therefore, on discovering an unwanted bias, a practitioner can either attempt to change the model architecture to explicitly prevent the bias from emerging, or attempt to manipulate the training data. If neither of these is possible (for example, if the appropriate data manipulation is too expensive, or if the bias cannot easily be suppressed in the architecture), it may be possible to perform zeroth-order optimization of the models. For example, one may perform post-hoc model selection, either using early stopping or by selecting a suitable model from the set of initial seeds.
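Post-hoc selection of this kind can be sketched as choosing, among checkpoints (seeds and/or training steps) whose held-out accuracy is within a tolerance of the best, the one whose measured shape bias is closest to a target. The `Checkpoint` record, the tolerance, and the numbers in the example are hypothetical; in practice the bias would be measured with a probe dataset as in Section 6.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Checkpoint:
    seed: int
    step: int
    accuracy: float  # held-out classification accuracy
    shape_bias: float  # B_s measured on a probe dataset


def select_checkpoint(
    checkpoints: Iterable[Checkpoint], target_bias: float, accuracy_tolerance: float = 0.005
) -> Checkpoint:
    """Among near-best-accuracy checkpoints, pick the one whose B_s is closest to target_bias."""
    candidates = list(checkpoints)
    if not candidates:
        raise ValueError("no checkpoints to choose from")
    best_accuracy = max(c.accuracy for c in candidates)
    eligible = [c for c in candidates if c.accuracy >= best_accuracy - accuracy_tolerance]
    return min(eligible, key=lambda c: abs(c.shape_bias - target_bias))


# Hypothetical measurements for three seeds at the same training step.
checkpoints = [
    Checkpoint(seed=0, step=1000, accuracy=0.780, shape_bias=0.62),
    Checkpoint(seed=1, step=1000, accuracy=0.781, shape_bias=0.55),
    Checkpoint(seed=2, step=1000, accuracy=0.760, shape_bias=0.30),
]
chosen = select_checkpoint(checkpoints, target_bias=0.0)  # e.g. a colour-sensitive task
print(chosen.seed)  # 1: seed 2 has the lowest bias but falls outside the accuracy tolerance
```

The accuracy constraint reflects the observation above that seeds with near-identical accuracy can differ in bias, so selection can trade on bias without giving up classification performance.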


An important caveat to note is that behavioral tools often do not provide insight into the neural mechanisms. In our case, the DNN mechanism whereby model parameters and input images interact to give rise to a shape bias has not been elucidated, nor did we expect this to happen. Indeed, just as cognitive psychology often does for neuroscience, our new computational-level insights can provide a starting point for research at the mechanistic level. For example, in future work it would be interesting to use gradient-based visualization or neuron ablation techniques to augment the current results by identifying the mechanisms underlying the shape bias. The convergence of evidence from such introspective methods with the current behavioral method would create a richer account of these models’ solutions to the one-shot word learning problem.


7.2. Modelling human word learning 


There have been previous attempts to model human word
learning in the cognitive science literature (Colunga &
Smith, 2005; Xu & Tenenbaum, 2007; Schilling et al., 2012;
Mayor & Plunkett, 2010). However, none of these models is
capable of one-shot word learning at the scale of real-world
images. Because MNs both solve the task at scale and emulate
hallmark experimental findings, we propose MNs as a
computational-level account of human one-shot word learning.
Another feature of our results supports this contention: in
our model the shape bias increases dramatically early in
training (Fig. 2a); similarly, humans show the shape bias
much more strongly as adults than as children, and older
children show the bias more strongly than younger children
(Landau et al., 1988).


As a good cognitive model should, our DNNs make testable 
predictions about word-learning in humans. Specifically, 
the current results predict that the shape bias should vary 
across subjects as well as within a subject over the course 
of development. They also predict that, for humans with
adult-level one-shot word learning abilities, there should
be no correlation between shape bias magnitude and one-shot
word learning capability.
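One way to probe the second prediction is to estimate the correlation between shape bias and one-shot accuracy across a group of learners (human participants or trained networks) and ask whether it is distinguishable from zero. The sketch below uses Pearson's r with a permutation test; the choice of test, the function name and the synthetic data are assumptions for illustration.

```python
import numpy as np


def permutation_correlation_test(x, y, n_permutations=10_000, seed=0):
    """Return Pearson's r between x and y and a two-sided permutation p-value."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    r_obs = np.corrcoef(x, y)[0, 1]
    rng = np.random.default_rng(seed)
    count = 0
    for _ in range(n_permutations):
        # Shuffling y breaks any pairing with x, simulating the null hypothesis.
        r_perm = np.corrcoef(x, rng.permutation(y))[0, 1]
        if abs(r_perm) >= abs(r_obs):
            count += 1
    # Add-one correction keeps the p-value strictly positive.
    p_value = (count + 1) / (n_permutations + 1)
    return r_obs, p_value


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    shape_bias = rng.uniform(0.4, 0.9, size=30)  # hypothetical per-learner shape bias
    accuracy = rng.normal(0.85, 0.02, size=30)   # hypothetical one-shot accuracy
    r, p = permutation_correlation_test(shape_bias, accuracy)
    print(f"r = {r:.2f}, permutation p = {p:.3f}")
```

A non-significant r does not by itself establish the absence of a correlation; supporting the prediction directly would call for an equivalence test or a tight confidence interval on r around zero.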


Another promising direction for future cognitive research
would be to probe MNs for additional biases in order to
predict novel computational properties in humans. Probing
a model in this way is much faster than running human
behavioural experiments, so a wider range of hypotheses for
human word learning may be rapidly tested.


7.3. Cognitive Psychology for Deep Neural Networks 


Through the one-shot learning case study, we demonstrated
the utility of leveraging techniques from cognitive
psychology for understanding the computational properties of
DNNs. There is a wide-ranging literature in cognitive
psychology describing techniques for probing a spectrum of
behaviours in humans. Our work here paves the way for the
study of artificial cognitive psychology: the application of
these techniques to better understand DNNs.


For example, it would be useful to apply work from the
vast literature on episodic memory (Tulving, 1985) to the
recent flurry of episodic memory architectures (Blundell et
al., 2016; Graves et al., 2016), and to apply techniques from
the semantic cognition literature (Lamberts & Shanks, 2013)
to recent models of concept formation (Higgins et al., 2016;
Gregor et al., 2016; Raposo et al., 2017). More generally,
the rich psychological literature will become increasingly
useful for understanding deep reinforcement learning agents
as they learn to solve increasingly complex tasks.


8. Conclusion 


In this work, we have demonstrated how techniques from
cognitive psychology can be leveraged to help us better
understand DNNs. As a case study, we measured the shape
bias in two powerful yet poorly understood DNNs: Inception
and MNs. Our analysis revealed previously unknown
properties of these models. More generally, our work paves
the way for future exploration of DNNs using the rich body
of techniques developed in cognitive psychology.